Multi-Criteria Comparison of Coevolution and Temporal Difference Learning on Othello

Authors

  • Wojciech Jaskowski
  • Marcin Grzegorz Szubert
  • Pawel Liskowski
Abstract

We compare Temporal Difference Learning (TDL) with Coevolutionary Learning (CEL) on Othello. Apart from three popular single-criterion performance measures, namely (i) generalization performance (expected utility), (ii) average result against a hand-crafted heuristic, and (iii) result in a head-to-head match, we also compare the algorithms using performance profiles. This multi-criteria performance measure characterizes a player's performance in the context of opponents of various strength. The multi-criteria analysis reveals that although the generalization performance of players produced by the two algorithms is similar, TDL is much better at playing against strong opponents, while CEL copes better with weak ones. We also find that TDL produces less diverse strategies than CEL. Our results confirm the usefulness of performance profiles as a tool for comparing learning algorithms for games.
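To make the idea of a performance profile concrete, the sketch below bins a pool of opponents by a precomputed strength estimate and reports the player's average score per bin. It is only an illustration under assumed interfaces (a `play_match` callback and opponent strength values scaled to [0, 1)); the authors' actual procedure may differ.

```python
from statistics import mean

def performance_profile(player, opponents, play_match, num_bins=10):
    """Illustrative sketch of a performance profile (not the paper's code).

    `opponents` is a list of (opponent, strength) pairs, where `strength`
    is a precomputed difficulty estimate in [0, 1); `play_match(a, b)`
    returns the score of `a` against `b` in [0, 1].  Both are assumptions
    made for this example.
    """
    bins = [[] for _ in range(num_bins)]
    for opponent, strength in opponents:
        # Group opponents of similar strength into the same bin.
        index = min(int(strength * num_bins), num_bins - 1)
        bins[index].append(opponent)

    profile = []
    for index, bucket in enumerate(bins):
        if not bucket:
            profile.append((index, None))        # no opponents of this strength
            continue
        scores = [play_match(player, opponent) for opponent in bucket]
        profile.append((index, mean(scores)))    # average score vs. this bin
    return profile
```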

Similar Resources

Neuro-Evolution Through Augmenting Topologies Applied To Evolving Neural Networks To Play Othello

Many different approaches to game playing have been suggested, including alpha-beta search, temporal difference learning, genetic algorithms, and coevolution. Here, a powerful new algorithm for neuroevolution, NeuroEvolution of Augmenting Topologies (NEAT), is adapted to the game-playing domain. Evolution and coevolution were used to try to develop neural networks capable of defeating an alph...

Learning to Play Othello with N-Tuple Systems

This paper investigates the use of n-tuple systems as position value functions for the game of Othello. The architecture is described, and then evaluated for use with temporal difference learning. Performance is compared with previously developed weighted piece counters and multi-layer perceptrons. The n-tuple system is able to defeat the best-performing of these after just five hundred games o...
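As a rough sketch of what such an architecture can look like, the class below stores one weight table per tuple and values a position as the sum of the looked-up weights. The board encoding, tuple layout, and update rule are assumptions made for illustration, not necessarily the paper's exact design.

```python
class NTupleValueFunction:
    """Minimal sketch of an n-tuple position value function.

    Each n-tuple is a fixed list of board squares; the occupancy of those
    squares (0 = empty, 1 = black, 2 = white) forms a base-3 index into the
    tuple's own weight table, and the position value is the sum of the
    indexed weights.
    """

    def __init__(self, tuples):
        self.tuples = tuples                     # e.g. [[0, 1, 2, 3], [0, 9, 18, 27], ...]
        self.weights = [[0.0] * (3 ** len(t)) for t in tuples]

    def _index(self, board, squares):
        # Encode the contents of the tuple's squares as a base-3 number.
        idx = 0
        for sq in squares:
            idx = idx * 3 + board[sq]
        return idx

    def value(self, board):
        return sum(w[self._index(board, t)]
                   for t, w in zip(self.tuples, self.weights))

    def update(self, board, delta, alpha=0.01):
        # Gradient-style update: every active weight moves by the same amount.
        for t, w in zip(self.tuples, self.weights):
            w[self._index(board, t)] += alpha * delta
```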

The performance profile: A multi-criteria performance evaluation method for test-based problems

In test-based problems, solutions produced by search algorithms are typically assessed using average outcomes of interactions with multiple tests. This aggregation leads to information loss, which can render different solutions apparently indifferent and hinder comparison of search algorithms. In this paper we introduce the performance profile, a generic, domain-independent, multi-criteria performan...

Effect of look-ahead search depth in learning position evaluation functions for Othello using epsilon-greedy exploration

This paper studies the effect of varying the depth of look-ahead for heuristic search in temporal difference (TD) learning and game playing. The acquisition of position evaluation functions for the game of Othello is studied. The paper provides important insights into the strengths and weaknesses of using different search depths during learning when ε-greedy exploration is applied. The main findin...
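A minimal sketch of how ε-greedy exploration can sit on top of depth-limited look-ahead: with probability ε a random legal move is played, otherwise the move with the best negamax value (using the learned evaluation at the leaves) is chosen. The `legal_moves`/`apply_move` helpers and the side-to-move-relative value function are assumptions made for this example, not the paper's implementation.

```python
import random

def negamax(state, value_fn, legal_moves, apply_move, depth):
    """Depth-limited negamax using the learned evaluation at the leaves."""
    moves = legal_moves(state)
    if depth <= 0 or not moves:
        return value_fn(state)                   # leaf: side-to-move-relative value
    return max(-negamax(apply_move(state, m), value_fn,
                        legal_moves, apply_move, depth - 1)
               for m in moves)

def choose_move(state, value_fn, legal_moves, apply_move, depth, epsilon=0.1):
    """ε-greedy selection over look-ahead values (hypothetical helpers)."""
    moves = legal_moves(state)
    if random.random() < epsilon:
        return random.choice(moves)              # explore: random legal move
    # Exploit: the move whose depth-limited look-ahead value is best for us.
    return max(moves, key=lambda m: -negamax(apply_move(state, m), value_fn,
                                             legal_moves, apply_move, depth - 1))
```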

The Significance of Temporal-Difference Learning in Self-Play Training TD-Rummy versus EVO-rummy

Reinforcement learning has been used for training game-playing agents. The value function for a complex game must be approximated with a continuous function because the number of states becomes too large to enumerate. Temporal-difference learning with self-play is one method successfully used to derive the value approximation function. Coevolution of the value function is also claimed to yield ...
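For illustration, a bare-bones TD(0) self-play loop could look like the sketch below. It assumes a value function exposing `value`/`update` methods (as in the earlier n-tuple sketch), hypothetical `legal_moves`/`apply_move`/`choose_move`/`outcome` helpers, and ignores the side-to-move sign flip for brevity; it is not the paper's implementation.

```python
def td0_self_play_episode(value_fn, initial_state, legal_moves, apply_move,
                          choose_move, outcome, alpha=0.01):
    """One self-play game with TD(0) updates (all interfaces are assumed).

    `outcome` maps a terminal state to a result such as +1 (win),
    0 (draw), or -1 (loss).
    """
    state = initial_state()
    while legal_moves(state):
        next_state = apply_move(state, choose_move(state, legal_moves(state)))
        if legal_moves(next_state):
            target = value_fn.value(next_state)  # bootstrap from the successor
        else:
            target = outcome(next_state)         # terminal: actual game result
        # TD(0): move V(state) toward the one-step target.
        value_fn.update(state, target - value_fn.value(state), alpha)
        state = next_state
    return value_fn
```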

Publication year: 2014